Exploration server for computer science research in Lorraine

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Strategies d'échantillonnage pour l'apprentissage par renforcement batch (Sampling strategies for batch mode reinforcement learning)

Internal identifier: 001686 (Main/Exploration); previous: 001685; next: 001687

Authors: Raphael Fonteneau [Belgium]; Susan A. Murphy [United States]; Louis Wehenkel [Belgium]; Damien Ernst [Belgium]

Source: Revue d'intelligence artificielle (Rev. intell. artif.), 2013, ISSN 0992-499X

RBID : Pascal:13-0216765

Abstract

We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.
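
The first strategy can be illustrated with a minimal sketch. The abstract only states that a policy learning algorithm and a model identification method are given a priori, so everything concrete here is an assumption made for illustration: the one-step toy setting, the greedy learner `learn_policy`, the nearest-neighbour model `predict_reward`, and the function name `falsifying_experiments` are hypothetical and are not taken from the paper. The idea shown is to predict the outcome of each candidate experiment with the model, re-learn the policy with that predicted sample added, and keep the candidates whose predicted outcome would change the current policy.

```python
from statistics import mean


def learn_policy(samples, states, actions):
    """Hypothetical policy learner: greedy on the mean observed one-step reward.

    Unseen (state, action) pairs are valued 0.0; ties go to the first action.
    """
    policy = {}
    for x in states:
        def value(u):
            rewards = [r for (xs, us, r) in samples if xs == x and us == u]
            return mean(rewards) if rewards else 0.0
        policy[x] = max(actions, key=value)
    return policy


def predict_reward(samples, x, u):
    """Hypothetical model: reward of the nearest sampled state under action u."""
    same_action = [(abs(xs - x), r) for (xs, us, r) in samples if us == u]
    return min(same_action)[1] if same_action else 0.0


def falsifying_experiments(samples, states, actions):
    """Return the (state, action) pairs predicted to change the current policy."""
    current = learn_policy(samples, states, actions)
    found = []
    for x in states:
        for u in actions:
            # Assumed step: simulate the experiment with the identified model.
            predicted = (x, u, predict_reward(samples, x, u))
            revised = learn_policy(samples + [predicted], states, actions)
            if revised != current:
                found.append((x, u))
    return found


# Toy data: both actions were tried in state 0 only; action 1 paid off there.
samples = [(0, 0, 0.0), (0, 1, 1.0)]
candidates = falsifying_experiments(samples, states=[0, 1, 2], actions=[0, 1])
print(candidates)
```

In this toy run the candidates are the experiments that try action 1 in the unexplored states 1 and 2, since the model predicts they would overturn the default choice there. A real setting would use multi-step returns and the learner and model chosen by the practitioner.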


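Affiliations: Université de Liège, Belgium (Raphael Fonteneau, Louis Wehenkel, Damien Ernst); University of Michigan, United States (Susan A. Murphy)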



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author>
<name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName>
<settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
<author>
<name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Université du Michigan</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName>
<settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
<author>
<name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName>
<settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">13-0216765</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0216765 INIST</idno>
<idno type="RBID">Pascal:13-0216765</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000063</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000944</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000029</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000029</idno>
<idno type="wicri:doubleKey">0992-499X:2013:Fonteneau R:strategies:d:echantillonnage</idno>
<idno type="wicri:Area/Main/Merge">001700</idno>
<idno type="wicri:Area/Main/Curation">001686</idno>
<idno type="wicri:Area/Main/Exploration">001686</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author>
<name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName>
<settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
<author>
<name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Université du Michigan</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName>
<settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
<author>
<name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName>
<settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Action</term>
<term>Active system</term>
<term>Artificial intelligence</term>
<term>Learning algorithm</term>
<term>Model matching</term>
<term>Optimal control</term>
<term>Optimal control (mathematics)</term>
<term>Optimal policy</term>
<term>Reinforcement learning</term>
<term>Sampling</term>
<term>State space</term>
<term>State space method</term>
<term>Supervised learning</term>
<term>System identification</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Apprentissage renforcé</term>
<term>Intelligence artificielle</term>
<term>Action</term>
<term>Système actif</term>
<term>Apprentissage supervisé</term>
<term>Commande optimale</term>
<term>Contrôle optimal</term>
<term>Politique optimale</term>
<term>Echantillonnage</term>
<term>Algorithme apprentissage</term>
<term>Identification système</term>
<term>Ajustement modèle</term>
<term>Méthode espace état</term>
<term>Espace état</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Intelligence artificielle</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Belgique</li>
<li>États-Unis</li>
</country>
<region>
<li>Province de Liège</li>
<li>Région wallonne</li>
</region>
<settlement>
<li>Liège</li>
</settlement>
<orgName>
<li>Université de Liège</li>
</orgName>
</list>
<tree>
<country name="Belgique">
<region name="Région wallonne">
<name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
</region>
<name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
</country>
<country name="États-Unis">
<noRegion>
<name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001686 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001686 | SxmlIndent | more
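
Where Dilib is not available, the fields of such a record can also be read with a general-purpose XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes an abridged, hand-copied fragment of the record (`FRAGMENT`); the helper name `summarize` is hypothetical. The full record as shown uses the `wicri:` and `inist:` prefixes without declaring them, so a namespace-aware parser would likely reject it unless those declarations are supplied. The fragment therefore leaves those elements and attributes out.

```python
import xml.etree.ElementTree as ET

# Abridged fragment of the record above, without the undeclared
# wicri:/inist: prefixes (an assumption made to keep it parseable).
FRAGMENT = """
<record>
  <titleStmt>
    <author><name first="Raphael" last="Fonteneau">Raphael Fonteneau</name></author>
    <author><name first="Susan A." last="Murphy">Susan A. Murphy</name></author>
  </titleStmt>
  <keywords scheme="KwdEn" xml:lang="en">
    <term>Reinforcement learning</term>
    <term>Sampling</term>
  </keywords>
</record>
"""


def summarize(xml_text):
    """Return (author last names, English keywords) found in the fragment."""
    root = ET.fromstring(xml_text)
    # Every <name> element carries the author's last name as an attribute.
    authors = [name.get("last") for name in root.iter("name")]
    # Keep only the terms listed under the English keyword scheme.
    terms = [t.text for t in root.findall(".//keywords[@scheme='KwdEn']/term")]
    return authors, terms


authors, terms = summarize(FRAGMENT)
print(authors)
print(terms)
```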

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:13-0216765
   |texte=   Strategies d'échantillonnage pour l'apprentissage par renforcement batch
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022